This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2009-08-26 11:56 by RonnyPfannschmidt, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | Uploaded by | Date |
|---|---|---|
| bytestrpickle.diff | valhallasw | 2013-12-06 20:47 |
| bytestrpickle.diff | valhallasw | 2013-12-06 23:22 |
| pickle_python2_str_as_bytes.diff | alexandre.vassalotti | 2013-12-07 02:19 |
| Messages (32) | |||
|---|---|---|---|
| msg91966 | Author: (RonnyPfannschmidt) | Date: 2009-08-26 11:56 | |
I just noticed that there are some slight differences in the byte-string/unicode-string pickles between Python 2 and 3 when using protocols 0, 1 and 2. The first things I noticed are: a str from Python 2 is unpickled as unicode (str) in Python 3, which fails for byte strings that don't fit what's expected for unicode; and a bytes instance from Python 3 is pickled as a custom class in protocols < 3. I'll write a script to try all combinations of protocols, string variations and transfer directions. |
|||
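You can reproduce the first observation on a current Python 3 without a Python 2 interpreter. This minimal sketch rests on one assumption: the byte strings are hand-written protocol-2 pickles in the shape Python 2 emits for a str (a SHORT_BINSTRING opcode), not output captured from Python 2. With the default encoding='ASCII' of pickle.loads, the ASCII payload comes back as a Python 3 str and the non-ASCII payload raises UnicodeDecodeError. The helper name load_py2_str is hypothetical.

```python
import pickle

# Hand-written protocol-2 pickles shaped like Python 2's output for a str:
# PROTO 2, SHORT_BINSTRING <len> <payload>, BINPUT 0, STOP.
PY2_STR_ASCII = b'\x80\x02U\x03abcq\x00.'   # Python 2 value: 'abc'
PY2_STR_HIGH = b'\x80\x02U\x01\xffq\x00.'   # Python 2 value: '\xff'


def load_py2_str(data):
    """Load with default settings (encoding='ASCII', errors='strict').

    Returns the unpickled value, or the UnicodeDecodeError if decoding fails.
    """
    try:
        return pickle.loads(data)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = load_py2_str(PY2_STR_ASCII)
high_result = load_py2_str(PY2_STR_HIGH)
print(type(ascii_result).__name__, repr(ascii_result))  # str 'abc'
print(type(high_result).__name__)                       # UnicodeDecodeError
```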
| msg91967 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-08-26 12:19 | |
Why are you reporting this here? If you think there is a bug, can you propose an alternative behavior that you would consider correct? The changes you mentioned are all deliberate. |
|||
| msg91970 | Author: (RonnyPfannschmidt) | Date: 2009-08-26 12:42 | |
The basic behavior I want to see for all protocols <= 2:
1. Python 2 str maps to Python 3 bytes
2. Python 2 unicode maps to Python 3 str
3. Python 3 str maps to Python 2 unicode
4. Python 3 bytes maps to Python 2 str
Anything else is confusing and may break; for example, one can't unpickle '\xFF' in Python 3 if it was pickled in Python 2. Note that these changes seem irrelevant for protocol 3, as Python 2.x doesn't support it. |
|||
| msg91978 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-08-26 18:01 | |
> the basic behavior i want to see for all protocols <= 2
>
> 1. python 2 string maps to python3 byte-string
That would not be good. Many people create pickles in 2.x where the string type really represents characters, more often so than they want it to represent bytes. Giving them bytes on unpickling will likely cause more problems than the current approach.
> 2. python 2 unicode maps to python3 string
That's the case, right?
> 3. python 3 string map to python 2 unicode
That's also the case, AFAICT.
> 4. python 3 bytestring maps to python 2 string
Hmm. This may be indeed a mistake. Until r61467, bytes were saved with the (BIN)STRING code; not sure why this was changed. |
|||
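Points 2 and 3 in msg91978 can be checked from the Python 3 side with the standard pickletools module. The sketch below assumes Python 3. A str pickled with protocol 2 is written with the BINUNICODE opcode (a UTF-8 payload), which is also the opcode Python 2 uses for unicode at the binary protocols. Loading it yields str whatever the encoding= argument says, because that argument only governs the 8-bit string opcodes.

```python
import pickle
import pickletools

# A Python 3 str pickled at protocol 2 is written as BINUNICODE (UTF-8 payload).
data = pickle.dumps('\xff', protocol=2)
opnames = [op.name for op, arg, pos in pickletools.genops(data)]
print(opnames)  # ['PROTO', 'BINUNICODE', 'BINPUT', 'STOP']

# Reading BINUNICODE back gives str again; encoding= only affects the
# Python 2 str opcodes (STRING, BINSTRING, SHORT_BINSTRING).
print(pickle.loads(data, encoding='bytes') == '\xff')  # True
```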
| msg91980 | Author: (RonnyPfannschmidt) | Date: 2009-08-26 18:18 | |
Since it breaks for anything non-ASCII, it's not that helpful after all, and since Python 2 strings are encoding-unaware there is no way to fix it. It might be preferable to supply unpicklers that are capable of coercing if the user really wants coercing.
yup
>
> > 3. python 3 string map to python 2 unicode
>
> That's also the case, AFAICT.
yup
>
> > 4. python 3 bytestring maps to python 2 string
>
> Hmm. This may be indeed a mistake. Until r61467, bytes were saved
> with the (BIN)STRING code; not sure why this was changed.
Python 3 is indeed evil there:
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.'
I'm convinced that a 1:1 mapping of Python 2 str from/to Python 3 bytes is the least surprising behaviour and will keep surprising errors away when communicating between different Python versions. It has just bitten me, and I suspect it will get others, too. An unpickler that completely fails in the face of encodings is not desirable at all. |
|||
| msg91998 | Author: (RonnyPfannschmidt) | Date: 2009-08-27 08:13 | |
It's even worse.
Python 3:
>>> import pickle
>>> pickle.dumps(b'', protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.'
Python 2.6:
>>> import pickle
>>> pickle.loads('\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.')
'[]'
|
|||
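Disassembling the pickle with pickletools shows what msg91998 is running into. The sketch below checks only properties that should hold on any Python 3. Protocol 2 has no bytes opcode, so a bytes object is written as a call to a global (GLOBAL ... REDUCE) rather than with one of the Python 2 str opcodes. The particular global may differ between Python 3 releases, so the disassembly is printed rather than asserted. How a given Python 2 release interprets that call is not tested here.

```python
import pickle
import pickletools

# How does Python 3 pickle a bytes object when limited to protocol 2?
data = pickle.dumps(b'abc', protocol=2)
opnames = [op.name for op, arg, pos in pickletools.genops(data)]

# No Python 2 str opcode is used; the value is rebuilt by calling a global
# with arguments (GLOBAL ... REDUCE).
uses_reduce = 'GLOBAL' in opnames and 'REDUCE' in opnames
uses_str_opcode = any(
    name in ('STRING', 'BINSTRING', 'SHORT_BINSTRING') for name in opnames
)
print(uses_reduce, uses_str_opcode)  # True False

# Print the full disassembly; which global it names may vary by version.
pickletools.dis(data)

# Within Python 3 the round trip is lossless either way.
assert pickle.loads(data) == b'abc'
```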
| msg92002 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-08-27 13:53 | |
The problem with trying to solve the following issue: "a bytes instance from python3 is pickled as custom class in protocols <3" is that if we pickle bytes from Python 3 as a 2.x str in protocol <= 2, unpickling it using Python 3 will yield a str (unicode), not a bytes object. Therefore the whole chain (pickling then unpickling) will not be idempotent. |
|||
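Pitrou's round-trip concern can be shown with a hand-written pickle. The assumption here is that it mimics the SHORT_BINSTRING form a 2.x str would take; it is not output from an actual patched pickler. Read back with the default settings, the value comes back as str. Only an opt-in flag on the reading side restores the original type; here that flag is the encoding='bytes' option that current Python 3 versions of pickle.loads accept.

```python
import pickle

# Suppose Python 3 wrote b'abc' with the Python 2 str opcode, as proposed.
# Hand-written pickle: PROTO 2, SHORT_BINSTRING 'abc', BINPUT 0, STOP.
as_py2_str = b'\x80\x02U\x03abcq\x00.'
original = b'abc'

# Default unpickling decodes the 8-bit string: the type changes.
reloaded = pickle.loads(as_py2_str)
print(type(reloaded).__name__, reloaded == original)  # str False

# Opting in on the reading side gives the original bytes back.
restored = pickle.loads(as_py2_str, encoding='bytes')
print(type(restored).__name__, restored == original)  # bytes True
```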
| msg92003 | Author: (RonnyPfannschmidt) | Date: 2009-08-27 14:55 | |
Unpickling any non-ASCII string from Python 2 will break. The only way out would be to ensure text strings and a single defined encoding (at which point storing unicode strings in any case seems more practical). Also, byte strings stored as Python 2 str would break, and since I pass around binary strings as parts of objects, it's just completely broken for me. |
|||
| msg92012 | Author: (RonnyPfannschmidt) | Date: 2009-08-27 19:15 | |
In case the actual behavior is not supposed to change, how about a way to declare that one wants an exact 1:1 mapping between py2 <> py3, so that str <> bytes and unicode <> str will work for sure? Something like load/dump(..., encoding=bytes) just crossed my mind. |
|||
| msg92014 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-08-27 21:08 | |
> how about a way to declare one wants exact 1:1 mapping between py2<>py3,
> so str<>bytes and unicode<>str will work for sure
In a sense, that's already possible: inherit from _Pickler/_Unpickler, and replace the dispatch dict with a different mapping. I wouldn't object to supporting this with an option, though, assuming it was properly documented and implemented for both pickle and _pickle (probably along with pickletools). |
|||
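Loewis's suggestion can be sketched with the pure-Python unpickler. This is a hypothetical illustration, not the patch that was eventually committed. BytestrUnpickler and loads_bytestr are made-up names. The code relies on private internals of pickle._Unpickler: its dispatch table keyed by opcode byte, and the read/append helpers it sets up inside load(). These are not a public API and may change. Only the binary opcodes SHORT_BINSTRING and BINSTRING are overridden; the protocol-0 STRING opcode keeps the default behaviour.

```python
import io
import pickle
import struct


class BytestrUnpickler(pickle._Unpickler):
    """Hypothetical unpickler returning Python 2 binary str payloads as bytes.

    Built on the private pure-Python pickle._Unpickler; the C accelerator
    (_pickle) is not involved at all.
    """

    # Copy the inherited table so the base class keeps its own behaviour.
    dispatch = dict(pickle._Unpickler.dispatch)

    def load_short_binstring(self):
        # SHORT_BINSTRING: 1-byte length, then the raw payload (not decoded).
        length = self.read(1)[0]
        self.append(self.read(length))

    dispatch[pickle.SHORT_BINSTRING[0]] = load_short_binstring

    def load_binstring(self):
        # BINSTRING: 4-byte little-endian signed length, then the payload.
        (length,) = struct.unpack('<i', self.read(4))
        if length < 0:
            raise pickle.UnpicklingError('BINSTRING pickle has negative byte count')
        self.append(self.read(length))

    dispatch[pickle.BINSTRING[0]] = load_binstring


def loads_bytestr(data):
    """Hypothetical helper: unpickle data with BytestrUnpickler."""
    return BytestrUnpickler(io.BytesIO(data)).load()


print(loads_bytestr(b'\x80\x02U\x01\xffq\x00.'))  # b'\xff'
```

BINUNICODE and every other opcode still go through the inherited handlers, so Python 2 unicode values keep loading as str.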
| msg92072 | Author: Gabriel Genellina (ggenellina) | Date: 2009-08-29 22:04 | |
Note that this is also a documentation issue: "The pickle serialization format is guaranteed to be backwards compatible across Python releases." |
|||
| msg92592 | Author: (RonnyPfannschmidt) | Date: 2009-09-14 07:04 | |
I'll try to add some tests now; hopefully I can get rid of the implicit badness, like trying to coerce bytes to unicode when unpickling and storing bytes as a list when pickling with protocol < 3. |
|||
| msg153659 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-18 23:37 | |
Any news on this?
Just as a note, pickletools.py also does not reflect the current behaviour; the opcodes STRING, BINSTRING and SHORT_BINSTRING are all defined with stack_after=[pystring]:
[1, line 992]
    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pystring],
      proto=0,
      doc=(...)
    )
Although the doc=... does describe that it will be decoded, the object type of pystring is still defined as bytes:
[1, line 747]
    pystring = StackObject(
        name='string',
        obtype=bytes,
        doc="A Python (8-bit) string object.")
[1] http://hg.python.org/cpython/file/98df29d51e12/Lib/pickletools.py |
|||
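To check the metadata msg153659 refers to on a given interpreter, use pickletools.opcodes, the module's opcode table as a list of OpcodeInfo records. This sketch looks up the three Python 2 str opcodes and prints the stack object each one is documented to push. The printed stack types are not asserted, because they depend on the pickletools version. The opcode characters and protocol numbers quoted in the message are stable.

```python
import pickletools

# Look up the three Python 2 str opcodes in pickletools' opcode table.
STR_OPCODES = ('STRING', 'BINSTRING', 'SHORT_BINSTRING')
info = {op.name: op for op in pickletools.opcodes if op.name in STR_OPCODES}

for name in STR_OPCODES:
    op = info[name]
    # stack_after lists StackObject records; obtype is the documented type.
    pushed = [(obj.name, obj.obtype) for obj in op.stack_after]
    print(name, repr(op.code), 'proto', op.proto, 'pushes', pushed)
```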
| msg153686 | Author: Ronny Pfannschmidt (Ronny.Pfannschmidt) | Date: 2012-02-19 09:32 | |
I'm unlikely to find the time to try to fix pickle/cPickle myself in the next few months. |
|||
| msg153705 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-19 15:49 | |
Last night, I hacked together a wrapper to do what loewis suggested [1]. It pickles bytes to str (for protocol <= 2), and unpickles str to bytes. If I (ever) get the build system and tests of Python itself to work, I'll try and see if I can implement a nicer solution, at least for pickle.py.
[1] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py |
|||
| msg153707 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012-02-19 15:53 | |
> If I (ever) get the build system and tests of python itself to work,
If you have any problems with that, don't hesitate to ask on python-dev (or see http://mail.python.org/mailman/listinfo/core-mentorship). |
|||
| msg153718 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-19 19:08 | |
OK, this is the pickle.py patch. A new parameter 'bytestr' has been added to both _Pickler and _Unpickler to toggle the pickle.string <=> bytes behaviour:
_Pickler:
    IF protocol <= 2 AND bytestr=True
    THEN bytes are stored as STRING/SHORT_BINSTRING/BINSTRING
    ELSE (the old behaviour; obj for protocol <= 2, else BINARY)
_Unpickler:
    IF bytestr=True
    THEN STRING/SHORT_BINSTRING/BINSTRING are read as bytes
    ELSE they are read as str (old behaviour)
I also extracted the decoding stuff from the three string-reading functions into a single one. |
|||
| msg153719 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-19 19:10 | |
P.S. (sorry for forgetting this in the original post ;-)) Both
    ./python -m test -G -v test_pickle
and
    ./python test_bytestrpickle.py
pass, but I have not run the entire test suite, as that takes ~90 minutes on my laptop. The test script should of course be merged with test_pickle.py at some point. |
|||
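You can sketch the pickling half of the 'bytestr' idea in msg153718 the same way, by overriding the type dispatch table of the pure-Python pickler. This is a hypothetical illustration under stated assumptions, not the attached patch. BytestrPickler and dumps_bytestr are made-up names, and pickle._Pickler is a private class. Only the binary protocols 1 and 2 are covered. Protocol 0 would need the text STRING opcode with a Python 2 style repr, and protocols >= 3 keep the native bytes opcodes.

```python
import io
import pickle
import struct


class BytestrPickler(pickle._Pickler):
    """Hypothetical pickler writing bytes with the Python 2 str opcodes.

    Applies only to binary protocols 1 and 2; protocol 0 and protocols >= 3
    fall back to the inherited bytes handling.
    """

    # Copy the inherited type -> method table before modifying it.
    dispatch = dict(pickle._Pickler.dispatch)

    def save_bytes_as_str(self, obj):
        if not self.bin or self.proto >= 3:
            pickle._Pickler.save_bytes(self, obj)
            return
        n = len(obj)
        if n < 256:
            # SHORT_BINSTRING: 1-byte length prefix.
            self.write(pickle.SHORT_BINSTRING + bytes([n]) + obj)
        else:
            # BINSTRING: 4-byte little-endian signed length prefix.
            self.write(pickle.BINSTRING + struct.pack('<i', n) + obj)
        self.memoize(obj)

    dispatch[bytes] = save_bytes_as_str


def dumps_bytestr(obj, protocol=2):
    """Hypothetical helper: pickle obj with BytestrPickler."""
    buf = io.BytesIO()
    BytestrPickler(buf, protocol=protocol).dump(obj)
    return buf.getvalue()


data = dumps_bytestr(b'\xff')
print(data)                                  # b'\x80\x02U\x01\xffq\x00.'
print(pickle.loads(data, encoding='bytes'))  # b'\xff'
```

Pair this with an unpickler that maps the str opcodes back to bytes and you get the str <> bytes mapping requested earlier in the thread. The catch is Pitrou's point in msg92002: a default Python 3 unpickler decodes these values to str.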
| msg154282 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-25 19:36 | |
OK, this is my first attempt at the Pickler part of the C implementation. I'll have to adapt the Python implementation to match this one. All BytestrPicklerTests in test_bytestrpickle.py pass, and ./python -m test -G -v test_pickle passes. Comments on style etc. are very welcome. |
|||
| msg154662 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-02-29 20:42 | |
Added tests in Lib/test format. After applying pickle.py.patch and BytestrPickler_c.diff,
    ./python -m test -v -m PyPicklerBytestrTests test_pickle
runs 12 tests with no errors, while
    ./python -m test -v -m CPicklerBytestrTests test_pickle
only passes
    test_dump_bytes_protocol_0 (test.test_pickle.CPicklerBytestrTests) ... ok
    test_dump_bytes_protocol_1 (test.test_pickle.CPicklerBytestrTests) ... ok
    test_dump_bytes_protocol_2 (test.test_pickle.CPicklerBytestrTests) ... ok
    test_dump_bytes_protocol_3 (test.test_pickle.CPicklerBytestrTests) ... ok
and has 8 errors (as expected). |
|||
| msg154795 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-03-02 20:35 | |
And here is a complete patch that implements the tests, the Python implementation and the C implementation. I'm not completely happy with the code duplication in the read_string/read_binstring/read_short_binstring C implementation, so that part could still be improved (however, there is already a lot of code duplication there at the moment). Again: comments would be very welcome... |
|||
| msg154832 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-03-03 12:37 | |
OK, and now a version that's not broken... I forgot to initialize self->bytestr for PicklerObject/UnpicklerObject. *puts on the you-broke-the-build-hat*
Except for test_packaging.test_caches, this version passes all tests; test_packaging.test_caches seems to fail because I ran make install for Python and installed {distribute,pip,setuptools,virtualenv}.
|
|||
| msg156166 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-03-17 16:06 | |
Based on the discussion on python-dev [1], this is an updated implementation that uses encoding='bytes' to signal str -> bytes behaviour.
[1] http://mail.python.org/pipermail/python-dev/2012-March/117536.html |
|||
| msg156167 | Author: Merlijn van Deen (valhallasw) * | Date: 2012-03-17 16:07 | |
...and the tests to go with that. |
|||
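The option that came out of the python-dev discussion is the encoding='bytes' argument to the unpickler. This sketch compares it with encoding='latin1' on current Python 3, using a hand-written protocol-2 pickle shaped like Python 2's output for {'key': '\xff'}. encoding='bytes' affects every Python 2 str in the pickle, dict keys included. 'latin1' decodes each byte to the code point with the same value and therefore never fails. The Python 3 pickle documentation also notes that encoding='latin1' is required for unpickling NumPy arrays and datetime, date and time instances pickled by Python 2.

```python
import pickle

# Hand-written protocol-2 pickle in the shape Python 2 produces for
# {'key': '\xff'}: EMPTY_DICT, two SHORT_BINSTRINGs, SETITEM.
PY2_DICT = b'\x80\x02}q\x00U\x03keyq\x01U\x01\xffq\x02s.'

# encoding='bytes': every Python 2 str, including dict keys, becomes bytes.
print(pickle.loads(PY2_DICT, encoding='bytes'))          # {b'key': b'\xff'}

# encoding='latin1': every Python 2 str becomes str, byte value == code point.
print(ascii(pickle.loads(PY2_DICT, encoding='latin1')))  # {'key': '\xff'}
```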
| msg205347 | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2013-12-06 03:42 | |
Could you provide a single patch with the implementation and the tests together? I will try to find some time this week to review this. |
|||
| msg205401 | Author: Merlijn van Deen (valhallasw) * | Date: 2013-12-06 20:47 | |
Hi Alexandre,
Attached is a diff based on r87793:0c508d87f80b.
Merlijn |
|||
| msg205412 | Author: Merlijn van Deen (valhallasw) * | Date: 2013-12-06 23:22 | |
I have fixed most of the nits in this patch, except for:
1) the intermediate bytes object being created; inlining is an option, as storchaka suggested, but I'd rather have you decide what it should become before implementing it;
2) make clinic gives me
    ./python -E ./Tools/clinic/clinic.py --make
    Error in file "./Modules/_pickle.c" on line 6611:
    Checksum mismatch!
    Expected: bed0d8bbe1c647960ccc6f997b33bf33935fa56f
    Computed: 58dcccb705487695fec30980f566027bc68d9c69
    make: *** [clinic] Error 255
and I have no clue how to fix that; the clinic docs are sparse, to say the least;
3) the tests are still in their own test case; please decide between the two of you what the best solution is;
4) I have grouped the test cases: test_load_python2_str_as_bytes (which checks protocols 0, 1, and 2), test_load_python2_unicode_as_str and test_load_long_python2_str_as_bytes;
5) I have moved the commands to create the shown pickled versions from docstrings to comments. If you think they are not useful, I'll remove them, but I found them pretty useful while shortening the strings. |
|||
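The test grouping described in msg205412 can be sketched as a standalone unittest case. The class name LoadPython2StrAsBytes is hypothetical, and this is not the committed test. The pickles are hand-written to match the shape Python 2 produces for the str '\xff': protocol 0 uses the STRING opcode, and protocols 1 and 2 use SHORT_BINSTRING, with the PROTO header added at protocol 2. Run it with python -m unittest on the file that contains it.

```python
import pickle
import unittest

# Hand-written pickles shaped like Python 2's output for the str '\xff',
# one per protocol (not captured from a real Python 2 interpreter).
PY2_STR_PICKLES = {
    0: b"S'\\xff'\np0\n.",         # STRING (escaped repr), PUT 0, STOP
    1: b'U\x01\xffq\x00.',          # SHORT_BINSTRING, BINPUT 0, STOP
    2: b'\x80\x02U\x01\xffq\x00.',  # PROTO 2, then the same opcodes
}


class LoadPython2StrAsBytes(unittest.TestCase):
    """Hypothetical tests mirroring the grouping described in msg205412."""

    def test_load_python2_str_as_bytes(self):
        for proto, data in PY2_STR_PICKLES.items():
            with self.subTest(protocol=proto):
                self.assertEqual(pickle.loads(data, encoding='bytes'), b'\xff')

    def test_default_encoding_rejects_non_ascii(self):
        # Without encoding='bytes', the 8-bit payload is decoded as ASCII.
        for proto, data in PY2_STR_PICKLES.items():
            with self.subTest(protocol=proto):
                with self.assertRaises(UnicodeDecodeError):
                    pickle.loads(data)
```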
| msg205435 | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2013-12-07 02:19 | |
I cleaned up the patch. I will submit it tonight if there are no major objections. |
|||
| msg205436 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013-12-07 02:57 | |
How about updating the documentation as well? |
|||
| msg205440 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-12-07 07:47 | |
And what about the issue mentioned in msg153659? |
|||
| msg205443 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-12-07 09:09 | |
New changeset bd71352e950f by Alexandre Vassalotti in branch 'default': Issue #6784: Strings from Python 2 can now be unpickled as bytes objects. http://hg.python.org/cpython/rev/bd71352e950f |
|||
| msg205444 | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2013-12-07 09:12 | |
I fixed up the last few review comments and submitted the patch. Thank you for the help! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:52 | admin | set | github: 51033 |
| 2013-12-07 09:12:55 | alexandre.vassalotti | set | status: open -> closed resolution: fixed messages: + msg205444 stage: patch review -> resolved |
| 2013-12-07 09:09:38 | python-dev | set | nosy: + python-dev messages: + msg205443 |
| 2013-12-07 07:47:30 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg205440 |
| 2013-12-07 02:57:04 | pitrou | set | messages: + msg205436 |
| 2013-12-07 02:19:37 | alexandre.vassalotti | set | files: + pickle_python2_str_as_bytes.diff messages: + msg205435 |
| 2013-12-06 23:22:12 | valhallasw | set | files: + bytestrpickle.diff messages: + msg205412 |
| 2013-12-06 20:47:25 | valhallasw | set | files: + bytestrpickle.diff messages: + msg205401 |
| 2013-12-06 20:45:49 | valhallasw | set | files: - pickle_bytes_tests.diff |
| 2013-12-06 20:45:48 | valhallasw | set | files: - pickle_bytes_code.diff |
| 2013-12-06 20:45:47 | valhallasw | set | files: - pickle_bytestr.patch |
| 2013-12-06 20:45:46 | valhallasw | set | files: - test_pickle.diff |
| 2013-12-06 20:45:45 | valhallasw | set | files: - BytestrPickler_c.diff |
| 2013-12-06 20:45:43 | valhallasw | set | files: - pickle.py.patch |
| 2013-12-06 03:42:25 | alexandre.vassalotti | set | priority: normal -> high versions: + Python 3.4, - Python 2.7, Python 3.2, Python 3.3 messages: + msg205347 assignee: docs@python -> alexandre.vassalotti stage: patch review |
| 2013-02-15 18:18:57 | flox | set | nosy: + flox |
| 2012-12-18 07:02:23 | kmike | set | nosy: + kmike |
| 2012-03-17 16:07:47 | valhallasw | set | files: + pickle_bytes_tests.diff messages: + msg156167 |
| 2012-03-17 16:07:00 | valhallasw | set | files: + pickle_bytes_code.diff messages: + msg156166 |
| 2012-03-03 12:37:53 | valhallasw | set | files: - pickle_bytestr.patch |
| 2012-03-03 12:37:38 | valhallasw | set | files: + pickle_bytestr.patch messages: + msg154832 |
| 2012-03-03 12:06:21 | valhallasw | set | files: - test_bytestrpickle.py |
| 2012-03-02 20:35:17 | valhallasw | set | files: + pickle_bytestr.patch messages: + msg154795 |
| 2012-02-29 20:43:00 | valhallasw | set | files: + test_pickle.diff messages: + msg154662 |
| 2012-02-25 19:36:21 | valhallasw | set | files: + BytestrPickler_c.diff messages: + msg154282 |
| 2012-02-19 19:10:29 | valhallasw | set | messages: + msg153719 |
| 2012-02-19 19:08:10 | valhallasw | set | files: + pickle.py.patch keywords: + patch messages: + msg153718 |
| 2012-02-19 19:03:21 | valhallasw | set | files: + test_bytestrpickle.py |
| 2012-02-19 15:53:44 | pitrou | set | messages: + msg153707 |
| 2012-02-19 15:49:20 | valhallasw | set | messages: + msg153705 |
| 2012-02-19 09:32:27 | Ronny.Pfannschmidt | set | nosy: + Ronny.Pfannschmidt messages: + msg153686 |
| 2012-02-19 08:37:00 | eric.araujo | set | versions: + Python 3.3, - Python 2.6, Python 3.1 |
| 2012-02-18 23:37:07 | valhallasw | set | messages: + msg153659 |
| 2012-02-14 21:44:55 | valhallasw | set | nosy: + valhallasw |
| 2011-02-21 23:03:19 | jcea | set | nosy: + jcea |
| 2011-02-02 15:48:15 | r.david.murray | set | nosy: + jdharper |
| 2011-02-02 15:47:44 | r.david.murray | link | issue11099 superseder |
| 2010-10-29 10:07:21 | admin | set | assignee: georg.brandl -> docs@python |
| 2009-09-14 07:04:17 | RonnyPfannschmidt | set | messages: + msg92592 |
| 2009-08-29 22:04:37 | ggenellina | set | nosy: + ggenellina, georg.brandl messages: + msg92072 assignee: georg.brandl components: + Documentation |
| 2009-08-28 09:38:54 | RonnyPfannschmidt | set | title: byte/unicode pickle incompatibilities between python2 and and python3 -> byte/unicode pickle incompatibilities between python2 and python3 |
| 2009-08-27 21:08:01 | loewis | set | messages: + msg92014 |
| 2009-08-27 19:15:13 | RonnyPfannschmidt | set | messages: + msg92012 |
| 2009-08-27 14:55:20 | RonnyPfannschmidt | set | messages: + msg92003 |
| 2009-08-27 13:53:41 | pitrou | set | versions: + Python 2.7, Python 3.2 nosy: + alexandre.vassalotti, gvanrossum, pitrou messages: + msg92002 components: + Library (Lib), - None |
| 2009-08-27 08:13:27 | RonnyPfannschmidt | set | messages: + msg91998 |
| 2009-08-26 18:18:03 | RonnyPfannschmidt | set | messages: + msg91980 |
| 2009-08-26 18:01:17 | loewis | set | messages: + msg91978 title: byte/unicode pickle incompatibilities between python2 and and python3 -> byte/unicode pickle incompatibilities between python2 and and python3 |
| 2009-08-26 12:42:23 | RonnyPfannschmidt | set | messages: + msg91970 |
| 2009-08-26 12:19:45 | loewis | set | nosy: + loewis messages: + msg91967 |
| 2009-08-26 12:12:25 | RonnyPfannschmidt | set | title: bytw/unicode string incompatibilities between python2 and and python3 -> byte/unicode pickle incompatibilities between python2 and and python3 |
| 2009-08-26 11:56:12 | RonnyPfannschmidt | create | |